MIPS bacterial genomes functional annotation benchmark dataset

Authors

  • Igor V. Tetko
  • Barbara Brauner
  • Irmtraud Dunger
  • Goar Frishman
  • Corinna Montrone
  • Gisela Fobo
  • Andreas Ruepp
  • Alexey V. Antonov
  • Dimitrij Surmeli
  • Hans-Werner Mewes
Abstract

MOTIVATION: Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult.

RESULTS: The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation.

AVAILABILITY: BFAB is available at http://mips.gsf.de/proj/bfab

Similar resources

A Phylogeny-Based Benchmarking Test for Orthology Inference Reveals the Limitations of Function-Based Validation

Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable ...

FunCat functional inference with belief propagation and feature integration

Pairwise comparison of sequence data is intensively used for automated functional protein annotation, while graphical models emerge as promising candidates for an integration of various heterogeneous features. We designed a model, termed hRMN that integrates different genomic features and implemented a variant of belief propagation for functional annotation transfer. hRMN allows the assignment ...

Beyond the "best" match: machine learning annotation of protein sequences by integration of different sources of information

MOTIVATION: Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods. RESULTS: The analyzed genomes were manually annotated with FunCat categories in MIPS providing a gold standard. Features desc...

A survey of bacterial insertion sequences using IScan

Bacterial insertion sequences (ISs) are the simplest kinds of bacterial mobile DNA. Evolutionary studies need consistent IS annotation across many different genomes. We have developed an open-source software package, IScan, to identify bacterial ISs and their sequence elements (inverted and target direct repeats) in multiple genomes using multiple flexible search parameters. We applied IScan to...

MIPS: analysis and annotation of proteins from whole genomes

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the sc...

Journal:
  • Bioinformatics

Volume 21, Issue 10

Publication date: 2005